Abstract: To further improve the generalization capability of the Takagi-Sugeno-Kang (TSK) fuzzy classifier on imbalanced datasets while preserving its linguistic interpretability, a deep TSK fuzzy classifier for imbalanced data (ID-TSK-FC) is proposed, inspired by ensemble learning. ID-TSK-FC consists of one imbalanced global linear regression sub-classifier (IGLRc) and several imbalanced TSK fuzzy sub-classifiers (I-TSK-FCs). Following the human cognitive behavior of proceeding "from globally coarse to locally fine" and the stacked generalization principle, ID-TSK-FC first trains an IGLRc on all original training samples to obtain a globally coarse classification result. The output of the IGLRc is then used to identify the nonlinearly distributed samples among the original training samples. On these samples, several local I-TSK-FCs are generated in a stacked deep structure to obtain locally fine results. Finally, minimum-distance voting is applied to the outputs of the stacked IGLRc and all I-TSK-FCs to obtain the final output of ID-TSK-FC. Experimental results show that ID-TSK-FC offers interpretability based on feature importance and achieves at least comparable generalization performance and linguistic interpretability.
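The coarse-to-fine pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: labels are coded as -1/+1, a class-weighted ridge regression stands in for the IGLRc, a margin test on its output marks the "nonlinearly distributed" samples, each I-TSK-FC is approximated by a zero-order TSK system with Gaussian rules centered on randomly chosen samples, and minimum-distance voting is read as "pick the label closest to any sub-classifier output". All function names and parameters (`fit`, `predict`, `margin`, `n_rules`, `width`) are hypothetical.

```python
import numpy as np

def balanced_weights(y):
    """Weight each sample inversely to its class size (labels are -1/+1)."""
    w = np.empty(len(y), dtype=float)
    for label in (-1, 1):
        mask = y == label
        w[mask] = len(y) / (2.0 * max(mask.sum(), 1))
    return w

def weighted_ridge(F, y, w, lam=1e-2):
    """Solve min_b sum_i w_i (F_i b - y_i)^2 + lam * ||b||^2."""
    A = F.T @ (w[:, None] * F) + lam * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ (w * y))

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def firing_strengths(X, centers, width):
    """Normalized Gaussian rule activations of a zero-order TSK system."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    act = np.exp(-d2 / (2.0 * width ** 2))
    return act / (act.sum(axis=1, keepdims=True) + 1e-12)

def min_distance_vote(outputs):
    """outputs: (n_samples, n_subclassifiers) real-valued scores.
    Assumed reading of the voting rule: return the label whose distance
    to the nearest sub-classifier output is smallest."""
    labels = np.array([-1.0, 1.0])
    dist = np.abs(outputs[:, :, None] - labels)      # (n, m, 2)
    return labels[dist.min(axis=1).argmin(axis=1)]

def fit(X, y, depth=2, n_rules=6, width=1.0, margin=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Global coarse stage: class-weighted linear regression on all samples.
    beta = weighted_ridge(add_bias(X), y, balanced_weights(y))
    out = add_bias(X) @ beta
    # Hypothetical criterion: samples the linear model fits poorly.
    hard = y * out < margin
    Z, yh, prev = X[hard], y[hard], out[hard]
    layers = []
    for _ in range(depth):
        if len(yh) == 0:
            break
        # Stacked generalization: feed the previous output in as a feature.
        Zin = np.hstack([Z, prev[:, None]])
        idx = rng.choice(len(Zin), size=min(n_rules, len(Zin)), replace=False)
        centers = Zin[idx]
        Phi = firing_strengths(Zin, centers, width)
        p = weighted_ridge(Phi, yh, balanced_weights(yh))
        prev = Phi @ p
        layers.append((centers, p))
    return beta, layers, width

def predict(X, model):
    beta, layers, width = model
    prev = add_bias(X) @ beta
    outputs = [prev]
    for centers, p in layers:
        Zin = np.hstack([X, prev[:, None]])
        prev = firing_strengths(Zin, centers, width) @ p
        outputs.append(prev)
    return min_distance_vote(np.stack(outputs, axis=1))
```

Calling `predict(X, fit(X, y))` returns one label in {-1, +1} per row of `X`. Because the local sub-classifiers are trained only on the samples flagged by the margin test, the global linear model still contributes its output for samples it already separates well, which is the intent of the coarse-to-fine design; the quality of this toy version depends heavily on the assumed `margin`, `width`, and rule centers.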